Abstract
The goal of this project was to create statistical and machine learning models for equipment sensor data from a semiconductor fab to predict wafer pass/fail functionality as a means to improve fab yield. The models explored were logistic regression, modified logistic regression and random forest. The best predictive model using the fewest number of sensor features was found to be iterated logistic regression with class balanced data using a SMOTE method.
| Model | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|
| logistic | 0.8630 | 0.8 | 0.8590 | 0.9197 |
| iterated logistic | 0.4795 | 1.0 | 0.5128 | 0.6481 |
| iterated logistic (SMOTE) | 0.8630 | 0.8 | 0.8590 | 0.9197 |
| random forest | 0.7808 | 0.8 | 0.7821 | 0.8702 |
Acknowledgements
Thank you to the SpringBoard team for providing a great course and learning framework to build off of. Special thank you to my mentor, Blaine Bateman, without whom I would not have learned so much in so little time.
In this age of electronics, a major driving force is the supply of low cost semiconductor chips. Chip manufacturing occurs in facilities called wafer fabs, in which silicon wafers up to 300 mm in diameter are subjected to a series of processing steps that implant, deposit and etch the microelectronic circuits onto the wafer surface in a 3-dimensional construct. Each of these process steps is performed by various tools that are fitted with sensors to measure their specific operation. Traditionally these sensor data are used by the equipment operator to detect malfunctions as close to real time as possible. If an error condition is detected the operator can quickly make adjustments or stop the process to avoid corrupting the existing wafer product.
Controlling costs associated with material waste is a major concern for fabs. The measure of overall wafer throughput is called the fab yield (yield) and is calculated as the number of functional wafers output from the fab divided by the total number of wafers started. The goal of the fab is to maximize yield by reducing waste by any means. One way to do this is to better utilize the equipment sensor data to identify the key instrument sensors of a semiconductor manufacturing line and model the combined alarm conditions for potential wafer failures. This capability would be sought out by any semiconductor manufacturer who wants to maximize their yield.
Currently wafers are tested after specific process levels are completed. Much of the critical testing can’t be performed until very far along in the manufacturing process. Thus a failure early in the process will consume costly resources as the wafer continues on to other fabrication steps, only to be scrapped at the end. If statistical sampling is used at final test then there’s also the added risk of failing chips shipping to customers.
Having the ability to catch failures during any process step minimizes the chance of failure propagation and improves isolation time of equipment issues thereby greatly improving yield and reducing costs.
The data used for this project is the SECOM Data Set, which is publicly available from the UCI archive. The dataset comprises 1567 observations, one per wafer fabricated, and 591 variables corresponding to various sensors in the fabrication equipment, with 104 failing observations in total.
The SECOM Data Set consists of two csv files. The first is a list of pass/fail and date/time results, one entry per wafer run, and the other contains the corresponding numeric results of sensor readings from a semiconductor manufacturing line. The data files do not contain a header so variable names ‘Status’, ‘Date’, and ‘Time’ were assigned for pass = -1 / fail = +1, date and time. The ‘Status’ variable is then updated to pass = 0 / fail = +1 for convenience when fitting the data to categorical models. The remaining variables assumed default names: V1, V2, etc. Table 2.1 below shows a sampling of the initial dataset. Since the sensor variables are not named in the original dataset, and since no information is provided about the physical meaning, source or process order of each sensor’s data, there is no way to attribute any process or business meaning to the data. Therefore the analysis herein will take a “black box” approach.
| Status | Date | Time | V1 | V2 | V3 | V4 |
|---|---|---|---|---|---|---|
| 0 | 19/07/2008 | 11:55:00 | 3030.93 | 2564.00 | 2187.733 | 1411.1265 |
| 0 | 19/07/2008 | 12:32:00 | 3095.78 | 2465.14 | 2230.422 | 1463.6606 |
| 1 | 19/07/2008 | 13:17:00 | 2932.61 | 2559.94 | 2186.411 | 1698.0172 |
| 0 | 19/07/2008 | 14:43:00 | 2988.72 | 2479.90 | 2199.033 | 909.7926 |
| 0 | 19/07/2008 | 15:22:00 | 3032.24 | 2502.87 | 2233.367 | 1326.5200 |
Sensor data are real-valued random variables by nature, so any variable that contains only missing data or has no variation is irrelevant for this analysis and can be dropped. The approach taken here was to drop all variables where the distribution min = max. It’s not clear why these data were included in the SECOM dataset, but since the goal is to identify signals or combinations of signals leading to an alarm condition, unvarying sensor data are irrelevant.
The next important issue with the dataset was to properly classify all missing values as “NA”. Missing results can be denoted by a number of non-standard labels including “N/A”, “missing”, “na” or even “ ”. The naniar package provides a simple function, replace_with_na_all(), to simplify converting this arbitrary list of labels to “NA”. Finally, there were a number of “NaN” designations that aren’t typically interpreted as missing values, but since the sensor data should be real values it was determined that these should also be treated as “NA”.
It was found that initially 5.59% of the dataset was missing. While that doesn’t seem large, its impact depends on how missingness is distributed within the dataset. Among the many useful features of the naniar package are plotting routines for visually exploring missingness. One of these routines, gg_miss_var(), is shown below in Fig. 2.1, in which the variables are ordered by total missingness and plotted on the y-axis, with the number of missing observations on the x-axis. The number of variables in this dataset is too large for printing, so the variable names are omitted from the y-axis. The notable takeaway here was that most of the missing data was limited to relatively few variables. The safe approach taken was to drop all variables with > 10% missing data, leaving just 1.63% total missing data for imputation.
Fig. 2.1: Visualization of initial missingness.
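The cleaning steps described above (drop variables that are constant or entirely missing, map non-standard labels to a single missing marker, then drop variables with more than 10% missing values) can be sketched as follows. The project itself used R and naniar; this is only a minimal Python illustration, and the label set, the function names and the toy data are assumptions rather than the author's code or SECOM values.

```python
import math

# Labels treated as missing. This set is an assumption drawn from the text.
MISSING_LABELS = {"", "N/A", "NA", "na", "missing", "NaN", "nan"}


def to_number(cell):
    """Return the cell as a float, or None when it denotes a missing value."""
    if cell is None:
        return None
    if isinstance(cell, str):
        text = cell.strip()
        if text in MISSING_LABELS:
            return None
        cell = float(text)
    return None if math.isnan(cell) else float(cell)


def clean_columns(columns, max_missing=0.10):
    """columns maps a variable name to its raw cells; returns the kept variables."""
    kept = {}
    for name, cells in columns.items():
        values = [to_number(c) for c in cells]
        present = [v for v in values if v is not None]
        if not present:                       # entirely missing
            continue
        if min(present) == max(present):      # no variation (min = max)
            continue
        if (len(values) - len(present)) / len(values) > max_missing:
            continue                          # too sparse to impute safely
        kept[name] = values
    return kept


def percent_missing(columns):
    """Share of missing cells across all kept variables, in percent."""
    cells = [v for values in columns.values() for v in values]
    return 100.0 * sum(v is None for v in cells) / len(cells)


# Toy data, not SECOM values.
raw = {
    "V1": ["1.0", "2.0", "NaN", "4.0", "5.0", "6.0", "7.0", "8.0", "9.0", "10.0", "11.0"],
    "V2": ["3.5"] * 11,                       # constant sensor
    "V3": ["0.1", " ", "N/A", "missing", "0.4", "na", "0.2", "0.3", "0.5", "0.6", "0.7"],
    "V4": ["NaN"] * 11,                       # entirely missing
}
cleaned = clean_columns(raw)
print(list(cleaned), round(percent_missing(cleaned), 2))
```

In this toy input only V1 survives: V2 is constant, V4 is entirely missing and V3 exceeds the 10% limit.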
There are several R packages for imputing data. Initially, the simputation package was chosen for its ease of use and integration with naniar and ggplot2. Unfortunately, the number of variables in this dataset created multiple run-time issues for the simputation engine so it had to be abandoned. Instead, the mice package, which stands for “Multivariate Imputation by Chained Equations”, provided powerful fitting functionality at a moderate computation cost. The package is capable of fitting a different imputation model to each variable, but the norm.nob method was applied uniformly to all variables and found to return reasonably good values on comparing pre- and post-imputation distributions. Fig. 2.2 shows a sample distribution for two model features, overlaying the imputed values in the histogram. Fig. 2.3 is a scatterplot of these two features demonstrating how pairwise distribution relationships are maintained by imputed data. A summary of the data wrangling and imputation effort is shown in Table 2.2. Of the initial 593 total variables in the dataset, the post-wrangling count was 414, and the initial 4.51% total missing data has been reduced to zero.
Fig. 2.2: Example distributions before and after imputation
Fig. 2.3: Example scatterplot overlay of imputed and original data. Imputed data is shown standalone on the right.
| Metric | Initial | Final |
|---|---|---|
| # of Variables | 593 | 417 |
| # of Observations | 1567 | 1567 |
| % Missings | 4.51 | 0 |
An important requirement for developing a multivariate model for this project is how well the individual sensor data distributions can be modeled by a known statistical distribution type. Ideally each variable would follow the well-known normal distribution. Fig. 3.1 below shows an example of one variable in the SECOM dataset that is roughly normal based on visual inspection of the probability density function (PDF). Included below that is its corresponding Q-Q plot, or quantile plot, which plots the measured versus theoretical quantiles. For an ideal normal distribution the fit line of a Q-Q plot would be collinear with the data and have very small residuals over the entire +3/-3 z-score range. The farther the data deviate from the fit line, the less confident we can be that the distribution is normal. A reasonable target is that at least 95% of the data fits the distribution, which corresponds to a good fit between z = +/-1.96. Since the goal of the model in this project is to predict physical wafer failures based on production sensor data, the PDF and Q-Q plots are also shown comparing passing and failing results. Except for differences in the tails, there’s very little distinction between the distributions. Since it’s not known whether the differences in the tail data are important, the apparent outliers will not be removed from the analysis yet. The second set of plots in Fig. 3.2 below shows the data for another sensor which also follows a normal distribution, but comparison of yield results shows differences in both the peak probability and tail distribution.
Fig. 3.1: Example of near normal distributions.
Fig. 3.2: Example of near normal distributions with yield differences.
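The Q-Q comparison described above can be made concrete with a small calculation: sort the sample, pair each standardized value with the theoretical normal quantile at the same plotting position, and look at the largest gap inside |z| <= 1.96. The following is a minimal Python sketch, not the R code used for the figures; the plotting position (i - 0.5)/n, the helper names and the synthetic samples are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical z-score, standardized sample value) pairs."""
    ordered = sorted(sample)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n                     # one common plotting position
        points.append((std_normal.inv_cdf(p), (x - mu) / sigma))
    return points


def max_central_deviation(points, z_limit=1.96):
    """Largest gap between sample and theoretical quantiles for |z| <= z_limit."""
    return max(abs(s - z) for z, s in points if abs(z) <= z_limit)


# Synthetic samples, not SECOM values: one near-normal, one right-skewed.
probs = [(i - 0.5) / 200 for i in range(1, 201)]
normal_like = [NormalDist(3000, 50).inv_cdf(p) for p in probs]
right_skewed = [math.exp(NormalDist().inv_cdf(p)) for p in probs]

normal_dev = max_central_deviation(qq_points(normal_like))
skewed_dev = max_central_deviation(qq_points(right_skewed))
print(normal_dev, skewed_dev)
```

A small central deviation corresponds to points lying on the Q-Q fit line; the skewed sample shows a much larger gap, which is the visual departure described in the text.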
What happens then for data that do not exhibit a normal distribution? Fig. 3.3 below shows the case for a right-skewed distribution at the top left with the corresponding Q-Q plot below it. The longer right tail is clear in the PDF but really stands out in the Q-Q plot, where the trend sharply deviates near z = +1. For distributions like this one the plan is to transform the original data into a form that is closer to normal. The plots on the right side show the result after taking the logarithm of the sensor values. The effect is to rebalance the distribution, making it more symmetric about the mean. There are many non-normal distribution types available for custom fitting these data, but the approach taken here will be to apply a logarithmic transform, or a shift followed by a transform. The problematic situation is when the data do not follow any single distribution type but are a superposition of two or more component distributions. Fig. 3.4 below shows examples of multi-modal sensor distributions. These data could be fit with a superposition of distributions, but review of the Q-Q plots shows that a normal distribution can describe the overall distribution adequately. In this project then, multi-modal distributions will be approximated by a normal or log-normal distribution, whichever best fits the data.
Fig. 3.3: Example skewed distribution.
Fig. 3.4: Example multi-modal distributions.
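The effect of the logarithmic transform, and of the "shift + transform" variant for sensors whose values are not strictly positive, can be illustrated numerically by comparing sample skewness before and after. This is a minimal Python sketch under stated assumptions: the shift rule (move the minimum to a small positive value `eps`) and the synthetic data are illustrative choices, not the procedure or values used in the project.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: mean cubed z-score (0 for a symmetric sample)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def shifted_log(values, eps=1.0):
    """log(x + shift), with shift chosen so that the minimum maps to log(eps)."""
    lowest = min(values)
    shift = eps - lowest if lowest < eps else 0.0
    return [math.log(v + shift) for v in values]


# Synthetic right-skewed sample, not SECOM values.
raw = [math.exp(0.1 * i) for i in range(50)]
offset = [v - 5.0 for v in raw]               # same shape, but with negative values

print(skewness(raw))                           # strongly positive
print(skewness(shifted_log(raw)))              # close to zero
print(skewness(shifted_log(offset)))           # close to zero after the shift
```

The shift only matters when the minimum is below `eps`; otherwise the function reduces to a plain logarithm.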
A scatter plot is a good way to assess relationships between pairs of variables. The more closely the data follow a trend with positive slope, the more the two variables are correlated. Conversely, the more closely the data follow a trend with negative slope, the more the two variables are anti-correlated. When building a model with a few variables this is a convenient way to visually identify the relevant features. The example scatter plot matrix in Fig. 3.5 below shows the relationships between the first ten variables in the SECOM dataset plus the dependent variable ‘Status’. To quantify the relationship between two variables, a correlation analysis is run that generates a correlation coefficient between -1 and +1. Correlated data obtain a coefficient > 0 up to a maximum of 1, and anti-correlated data obtain a coefficient < 0 down to a minimum of -1. Fig. 3.6 shows examples of highly correlated (top), highly anti-correlated (bottom) and uncorrelated (middle) variable pairs in the SECOM dataset. Table 3.1 below that lists the same information in tabular format.
Fig. 3.5: Example scatter plot matrix.
Fig. 3.6: Sample correlation coefficient ranges for the SECOM dataset.
| Range | var1 | var2 | coeff |
|---|---|---|---|
| bottom | V35 | V37 | -0.9993 |
| bottom | V94 | V107 | -0.9916 |
| bottom | V100 | V105 | -0.9902 |
| bottom | V93 | V106 | -0.9892 |
| bottom | V95 | V97 | -0.9565 |
| bottom | V97 | V99 | -0.8713 |
| bottom | V117 | V524 | -0.8580 |
| bottom | V117 | V252 | -0.8544 |
| bottom | V117 | V390 | -0.8538 |
| bottom | V123 | V131 | -0.8321 |
| middle | V305 | V584 | 0.0000 |
| middle | V279 | V301 | 0.0000 |
| middle | V274 | V305 | 0.0000 |
| middle | V304 | V410 | 0.0000 |
| middle | V85 | V484 | 0.0000 |
| middle | V165 | V477 | 0.0000 |
| middle | V2 | V527 | 0.0000 |
| middle | V132 | V226 | 0.0000 |
| middle | V132 | V304 | 0.0000 |
| middle | V24 | V220 | 0.0000 |
| top | V154 | V427 | 0.9992 |
| top | V255 | V527 | 0.9993 |
| top | V252 | V524 | 0.9993 |
| top | V178 | V449 | 0.9995 |
| top | V308 | V310 | 0.9996 |
| top | V157 | V430 | 0.9997 |
| top | V223 | V495 | 0.9998 |
| top | V249 | V521 | 0.9998 |
| top | V177 | V448 | 0.9998 |
| top | V173 | V175 | 1.0000 |
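For reference, the correlation coefficient reported in the table above is the Pearson coefficient, the covariance of two variables divided by the product of their standard deviations. A minimal Python sketch follows (the report's own analysis was done in R); the helper names and the toy columns are assumptions chosen so that the three regimes (top, middle, bottom) appear.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranked_pairs(columns):
    """All variable pairs as (coefficient, var1, var2), sorted low to high."""
    names = sorted(columns)
    pairs = [
        (pearson(columns[a], columns[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return sorted(pairs)


# Toy columns, not SECOM values.
columns = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [3.0, 5.0, 7.0, 9.0, 11.0],    # B = 2A + 1, perfectly correlated
    "C": [-1.0, -2.0, -3.0, -4.0, -5.0],  # C = -A, perfectly anti-correlated
    "D": [1.0, -1.0, 0.0, -1.0, 1.0],   # uncorrelated with A
}
ranking = ranked_pairs(columns)
print(ranking[0], ranking[-1])
```

Sorting all pairs by coefficient and reading off both ends and the middle of the list gives the same kind of summary as the table.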
A correlogram is another useful visualization tool for spotting trends in variable relationships. The correlogram reduces the information in the scatter plot matrix to a color-coded matrix of correlation coefficients for easy identification of trends. The relationships for the first 10 variables plus the dependent variable are shown below in Fig. 3.7. In a dataset with hundreds of variables these visual aids aren’t very useful. Fig. 3.8 below shows the case for the current SECOM dataset. The plot does show interesting clustering trends, but this amount of data is too cumbersome for visual analysis. The approach that will be taken in this project is to select features based on their statistical significance in a given model.
Fig. 3.7: Example correlogram plot.
Fig. 3.8: Correlogram plot for the SECOM dataset.
While the number of cross-correlations makes visual analysis cumbersome in general, it is still interesting to review how each variable is correlated to the dependent variable, ‘Status’. Fig. 3.9 plots the SECOM variables ordered by their correlation coefficient with ‘Status’. The notable takeaway is that none of the individual variables is significantly correlated with the output response meaning that the final model should expect to retain a significant number of independent variables.
Fig. 3.9: Correlation coefficients for the dependent variable, Status.
The approach taken here was to split the available data into 3 parts for initial model training (85%), model tuning (10%) and final test verification (5%). Table 4.1 below shows the breakdown of available data by number of observations and percentage of the total available dataset, and the number of pass/fail observations for the datasets used here. The pass:fail ratio of greater than 10:1 in these datasets is notable and is addressed later in the analysis.
| | Observations | Percent | Pass | Fail |
|---|---|---|---|---|
| All Data | 1567 | 100.00 | 1463 | 104 |
| Train | 1332 | 85.00 | 1244 | 88 |
| Tune | 157 | 10.02 | 146 | 11 |
| Test | 78 | 4.98 | 73 | 5 |
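The row counts in the table above follow directly from the 85/10/5 proportions applied to 1567 observations. A minimal Python sketch of such a three-way split is shown below; the shuffling, the seed and the function name are assumptions, and the report does not state whether the original split was random or stratified by class.

```python
import random


def three_way_split(n_rows, fractions=(0.85, 0.10), seed=0):
    """Shuffle row indices and cut them into train, tune and test parts.

    The test part receives whatever remains after train and tune.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_rows)
    n_tune = round(fractions[1] * n_rows)
    train = indices[:n_train]
    tune = indices[n_train:n_train + n_tune]
    test = indices[n_train + n_tune:]
    return train, tune, test


train, tune, test = three_way_split(1567)
print(len(train), len(tune), len(test))
```

With so few failing wafers, a stratified split (splitting pass and fail rows separately) is a common alternative that keeps the failure rate similar across the three parts.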
For this initial baseline model all available features are used in a logistic regression. The glm model was unable to converge consistently, so the bayesglm logistic model was chosen instead. The bayesglm model places a weakly informative Student-t prior on the regression coefficients, which regularizes the fit and helps it converge when there are many features and relatively few failures. Due to the number of available features the fitting results are lengthy and so are shown in Appendix B. The Akaike Information Criterion (AIC) is used as an estimator of relative fit quality when comparing models fit to the same dataset. The criterion weighs goodness of fit against the number of parameters used, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; the fewer parameters needed for a given fit, the lower the score and the better the model. AIC says nothing about the absolute quality of a model, so the score of 949 for this model serves as the reference for the models that follow. The confusion matrix results below show accuracy = 0.9925, sensitivity (TPR) = 0.9992 and specificity (TNR) = 0.8977 with an F1 score = 0.996. Note that the positive class in these results is 0 (pass), so TPR measures how well passing wafers are identified and TNR measures how well failures are caught. Since the purpose of the model is to accurately predict failures, TNR is a primary measure of the model capability, with accuracy and F1 score used to assess overall model performance. Overall the base training fit is very good.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1243 9
## 1 1 79
##
## Accuracy : 0.9925
## 95% CI : (0.9862, 0.9964)
## No Information Rate : 0.9339
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.9365
## Mcnemar's Test P-Value : 0.02686
##
## Sensitivity : 0.9992
## Specificity : 0.8977
## Pos Pred Value : 0.9928
## Neg Pred Value : 0.9875
## Prevalence : 0.9339
## Detection Rate : 0.9332
## Detection Prevalence : 0.9399
## Balanced Accuracy : 0.9485
##
## 'Positive' Class : 0
##
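The summary statistics in the output above follow from the four cells of the confusion matrix. Because the positive class is 0 (pass), the cell layout is: predicted 0 / reference 0 = TP, predicted 0 / reference 1 = FP, predicted 1 / reference 0 = FN, predicted 1 / reference 1 = TN. The following Python sketch recomputes the reported values from the training matrix; the function name is hypothetical and this is not the code that produced the output.

```python
def confusion_metrics(tp, fn, fp, tn):
    """TPR, TNR, accuracy and F1 score from the four confusion-matrix cells."""
    tpr = tp / (tp + fn)                      # sensitivity: passes kept as passes
    tnr = tn / (tn + fp)                      # specificity: failures caught
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)                # positive predictive value
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "TNR": tnr, "Accuracy": accuracy, "F1": f1}


# Cells read from the training confusion matrix above.
train_metrics = confusion_metrics(tp=1243, fn=1, fp=9, tn=79)
print({name: round(value, 4) for name, value in train_metrics.items()})
```

The same function applies unchanged to the tuning and test matrices later in the report.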
After running the tuning data against the model the accuracy drops to 0.8917, TPR to 0.9178 and F1 score to 0.9404. Overall this is okay, but TNR is very low at 0.5455.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 134 5
## 1 12 6
##
## Accuracy : 0.8917
## 95% CI : (0.8323, 0.9356)
## No Information Rate : 0.9299
## P-Value [Acc > NIR] : 0.9728
##
## Kappa : 0.358
## Mcnemar's Test P-Value : 0.1456
##
## Sensitivity : 0.9178
## Specificity : 0.5455
## Pos Pred Value : 0.9640
## Neg Pred Value : 0.3333
## Prevalence : 0.9299
## Detection Rate : 0.8535
## Detection Prevalence : 0.8854
## Balanced Accuracy : 0.7316
##
## 'Positive' Class : 0
##
The receiver operating characteristic (ROC) curve illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied between 0 and 1. The curve plots the true positive rate (TPR, sensitivity) against the false positive rate (1 - TNR, 1 - specificity), and the area under the curve (AUC) summarizes it in a single number that, for a useful classifier, lies between 0.5 and 1. A score of 0.5 corresponds to predictions no better than chance and is represented by a 45 degree line. A score of 1 corresponds to perfect discrimination and is represented by a curve that reaches the upper left-hand corner. For this model AUC = 0.7889. The ROC curve for the tuning dataset is plotted below.
Fig. 4.1: Receiver operating characteristic (ROC) curve for baseline logistic model fit.
Fig. 4.2: TPR, TNR, accuracy and F1 score vs. threshold.
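To make the ROC and AUC definitions concrete, the curve can be traced by sweeping the threshold over the predicted scores and the area obtained with the trapezoidal rule. This is a minimal Python sketch with toy scores and labels; it is not the R routine used for Fig. 4.1, and the function names are assumptions.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by sweeping the threshold from high to low.

    scores: predicted probability of the positive class.
    labels: 1 for the positive class, 0 for the negative class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy scores and labels, not model output from this project.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(auc(roc_points(scores, labels)))
```

The AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, which for this toy input is 14 of 16 pairs.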
Fig. 4.2 plots the model performance versus threshold. Based on the criteria for high TNR and high accuracy, the intersection of these two curves is chosen as the optimum threshold. Setting the threshold to 0.0174 increases TNR to 0.8182, but decreases accuracy to 0.7898 and TPR to 0.7877. F1 score = 0.8745.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 115 2
## 1 31 9
##
## Accuracy : 0.7898
## 95% CI : (0.7177, 0.8507)
## No Information Rate : 0.9299
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.273
## Mcnemar's Test P-Value : 1.093e-06
##
## Sensitivity : 0.7877
## Specificity : 0.8182
## Pos Pred Value : 0.9829
## Neg Pred Value : 0.2250
## Prevalence : 0.9299
## Detection Rate : 0.7325
## Detection Prevalence : 0.7452
## Balanced Accuracy : 0.8029
##
## 'Positive' Class : 0
##
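The threshold choice described above, taking the point where the TNR and accuracy curves intersect, can be sketched as a scan over candidate thresholds that picks the one where the two measures are closest. The Python below is only an illustration with toy probabilities; it assumes the model outputs a failure probability and that a wafer is flagged as failing when that probability is at or above the threshold, neither of which is stated explicitly in the report.

```python
def rates_at(threshold, fail_prob, status):
    """TPR, TNR and accuracy with pass (0) as the positive class.

    status: 1 = fail, 0 = pass. A wafer is flagged when fail_prob >= threshold.
    """
    flagged = [p >= threshold for p in fail_prob]
    fails = sum(status)
    passes = len(status) - fails
    caught = sum(1 for f, s in zip(flagged, status) if f and s == 1)
    cleared = sum(1 for f, s in zip(flagged, status) if not f and s == 0)
    return cleared / passes, caught / fails, (caught + cleared) / len(status)


def crossing_threshold(fail_prob, status, candidates):
    """Candidate threshold at which TNR and accuracy are closest together."""
    def gap(threshold):
        _, tnr, accuracy = rates_at(threshold, fail_prob, status)
        return abs(tnr - accuracy)
    return min(candidates, key=gap)


# Toy predictions, not model output from this project.
fail_prob = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.40, 0.60, 0.90]
status = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
best = crossing_threshold(fail_prob, status, [0.05, 0.15, 0.30, 0.50, 0.70])
print(best, rates_at(best, fail_prob, status))
```

Lowering the threshold flags more wafers, which raises TNR at the expense of TPR and accuracy, the same trade-off visible in the confusion matrix above.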
Test data results below show accuracy = 0.859, TPR = 0.863 and TNR = 0.8. F1 score = 0.9197.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 63 1
## 1 10 4
##
## Accuracy : 0.859
## 95% CI : (0.7617, 0.9274)
## No Information Rate : 0.9359
## P-Value [Acc > NIR] : 0.99604
##
## Kappa : 0.3607
## Mcnemar's Test P-Value : 0.01586
##
## Sensitivity : 0.8630
## Specificity : 0.8000
## Pos Pred Value : 0.9844
## Neg Pred Value : 0.2857
## Prevalence : 0.9359
## Detection Rate : 0.8077
## Detection Prevalence : 0.8205
## Balanced Accuracy : 0.8315
##
## 'Positive' Class : 0
##
| Model | Stage | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|---|
| logistic | train | 0.9992 | 0.8977 | 0.9925 | 0.9960 |
| logistic | tune | 0.7877 | 0.8182 | 0.7898 | 0.8745 |
| logistic | test | 0.8630 | 0.8000 | 0.8590 | 0.9197 |
Fig. 4.3: TPR, TNR, accuracy and F1 score vs. model fit stage
In the original logistic fit many coefficients showed very little significance, meaning their p-values were > 0.05. In the following fitting approach the logistic regression is performed iteratively, with the least significant variable removed after each iteration, so that only the 28 most significant features remain. Please refer to Appendix B for model output details. With this approach the AIC score improves significantly, to 515 from 949 for the base model. The accuracy = 0.9234, F1 score = 0.9601 and TPR = 0.9855 are also very good, but TNR = 0.0455 has dropped significantly compared with the base model, meaning that nearly all failing wafers in the training data are classified as passing. Confusion matrix results and a plot of most to least significant model features are shown below.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1226 84
## 1 18 4
##
## Accuracy : 0.9234
## 95% CI : (0.9078, 0.9371)
## No Information Rate : 0.9339
## P-Value [Acc > NIR] : 0.9426
##
## Kappa : 0.0476
## Mcnemar's Test P-Value : 1.227e-10
##
## Sensitivity : 0.98553
## Specificity : 0.04545
## Pos Pred Value : 0.93588
## Neg Pred Value : 0.18182
## Prevalence : 0.93393
## Detection Rate : 0.92042
## Detection Prevalence : 0.98348
## Balanced Accuracy : 0.51549
##
## 'Positive' Class : 0
##
Fig. 4.4: Significant model features, ordered most to least significant from left to right.
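The iterative fitting procedure used for this model is a form of backward elimination: refit, drop the least significant variable, repeat. The Python sketch below shows only the control loop. The refit step is passed in as a function, and `toy_p_values` is a hypothetical stand-in for a real logistic fit; the stopping rule (all p-values <= 0.05) is an assumption inferred from the text rather than a documented detail of the author's code.

```python
def backward_eliminate(features, fit_p_values, alpha=0.05):
    """Drop the least significant feature until every p-value is <= alpha.

    fit_p_values(features) must refit the model on the given features and
    return a dict mapping each feature name to its p-value.
    """
    kept = list(features)
    while kept:
        p_values = fit_p_values(kept)
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break                              # everything left is significant
        kept.remove(worst)
    return kept


def toy_p_values(features):
    """Hypothetical stand-in for a model refit; values are made up."""
    p = {"V1": 0.001, "V2": 0.60, "V3": 0.20, "V4": 0.03}
    if "V2" not in features:
        p["V3"] = 0.01                         # pretend V3 was masked by V2
    return {name: p[name] for name in features}


selected = backward_eliminate(["V1", "V2", "V3", "V4"], toy_p_values)
print(selected)
```

Refitting after every removal matters because p-values change once a correlated variable leaves the model, which the toy function mimics for V3.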
After running the tuning data against the model, accuracy, TPR and TNR improve slightly. F1 score = 0.9664.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 144 8
## 1 2 3
##
## Accuracy : 0.9363
## 95% CI : (0.886, 0.969)
## No Information Rate : 0.9299
## P-Value [Acc > NIR] : 0.4555
##
## Kappa : 0.3464
## Mcnemar's Test P-Value : 0.1138
##
## Sensitivity : 0.9863
## Specificity : 0.2727
## Pos Pred Value : 0.9474
## Neg Pred Value : 0.6000
## Prevalence : 0.9299
## Detection Rate : 0.9172
## Detection Prevalence : 0.9682
## Balanced Accuracy : 0.6295
##
## 'Positive' Class : 0
##
For this model AUC = 0.9066. The ROC curve for the tuning dataset is plotted below.
Fig. 4.5: Receiver operating characteristic (ROC) curve for baseline iterative logistic model fit.
Fig. 4.6: TPR, TNR, accuracy and F1 score vs. threshold.
Setting the threshold to 0.1 increases TNR to 0.8182, but decreases accuracy to 0.8153, F1 score to 0.8914 and TPR to 0.8151.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 119 2
## 1 27 9
##
## Accuracy : 0.8153
## 95% CI : (0.7456, 0.8727)
## No Information Rate : 0.9299
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.3088
## Mcnemar's Test P-Value : 8.324e-06
##
## Sensitivity : 0.8151
## Specificity : 0.8182
## Pos Pred Value : 0.9835
## Neg Pred Value : 0.2500
## Prevalence : 0.9299
## Detection Rate : 0.7580
## Detection Prevalence : 0.7707
## Balanced Accuracy : 0.8166
##
## 'Positive' Class : 0
##
Test data results below show accuracy = 0.5128, F1 score = 0.6481, TPR = 0.4795 and TNR = 1.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 35 0
## 1 38 5
##
## Accuracy : 0.5128
## 95% CI : (0.3969, 0.6277)
## No Information Rate : 0.9359
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1056
## Mcnemar's Test P-Value : 1.947e-09
##
## Sensitivity : 0.4795
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.1163
## Prevalence : 0.9359
## Detection Rate : 0.4487
## Detection Prevalence : 0.4487
## Balanced Accuracy : 0.7397
##
## 'Positive' Class : 0
##
| Model | Stage | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|---|
| iterated logistic | train | 0.9855 | 0.0455 | 0.9234 | 0.9601 |
| iterated logistic | tune | 0.8151 | 0.8182 | 0.8153 | 0.8914 |
| iterated logistic | test | 0.4795 | 1.0000 | 0.5128 | 0.6481 |
Fig. 4.7: TPR, TNR, accuracy and F1 score vs. model fit stage
A notable characteristic of the SECOM dataset mentioned at the outset of this project was the relatively small number of failure observations compared with passing ones, less than 1 in 10. This characteristic is known as an imbalanced classification problem and is problematic for machine learning algorithms, since they’re designed to improve accuracy by minimizing fit error but cannot account for the class distribution. This tends to bias the fit toward the majority class and yields an accurate model with poor ability to predict minority class observations. Dealing with imbalanced datasets entails strategies for balancing classes in the training data prior to using any machine learning algorithm. One strategy is to use sampling techniques to balance the original dataset. Resampling techniques in the literature include undersampling of the majority class, oversampling of the minority class and synthetic minority oversampling (SMOTE).
Another approach to dealing with class imbalance entails algorithm ensemble techniques that improve the performance of single classifiers. These techniques are advanced and a little beyond the scope of this work, but would be of interest for future development.
For this work the number of failure observations is too small to consider undersampling, so oversampling is required. SMOTE was chosen as it helps mitigate the overfitting errors associated with using multiple copies of identical observations, as occurs in simple oversampling, and the technique is readily implemented using the smotefamily package. Table 4.4 below shows the original and balanced training data results. SMOTE is performed only on the training data to avoid “leakage” of information into the tune/test data, which would produce misleading results. To maintain the predictive integrity of the tuning and test datasets, those datasets are not rebalanced. Fig. 4.8 overlays training dataset density plots of failure observations before and after rebalancing for a sample of feature distributions and shows that the synthesized data characteristics are consistent with the original, as desired.
| | Observations | Pass | Fail |
|---|---|---|---|
| Original | 1332 | 1244 | 88 |
| Balanced | 2476 | 1244 | 1232 |
Fig. 4.8: Example distributions comparing original and synthesized failure observations.
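The idea behind SMOTE is to create each synthetic failure by interpolating between an existing failure observation and one of its nearest failing neighbours, rather than duplicating rows. The following Python sketch is a simplified illustration of that idea, not the smotefamily implementation used in the project; the function name, the parameters `per_point` and `k`, and the toy points are assumptions, and at least two minority points are required.

```python
import random


def smote(minority, per_point, k=5, seed=0):
    """Generate per_point synthetic samples for every minority observation."""
    rng = random.Random(seed)
    synthetic = []
    for base in minority:
        # k nearest minority neighbours of the base point (squared Euclidean).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        for _ in range(per_point):
            mate = rng.choice(neighbours)
            gap = rng.random()                 # position along the segment
            synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Toy minority class, not SECOM values.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
synthetic = smote(minority, per_point=13, k=3)
print(len(minority) + len(synthetic))
```

For scale, the balanced count in the table above is consistent with 13 synthetic observations per original failure: 88 + 88 × 13 = 1232.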
Now that the training dataset is balanced, the iterative logistic model is refit. Please refer to Appendix B for detailed model output. The AIC score of 457 is a further improvement, although it is computed on the rebalanced training data and so is not strictly comparable with the earlier scores. The fit gives accuracy = 0.998, F1 score = 0.998, TPR = 0.996 and TNR = 1. Confusion matrix results and a plot of most to least significant model features are shown below. The number of significant features has substantially increased to 139, as shown in Fig. 4.9.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1239 0
## 1 5 1232
##
## Accuracy : 0.998
## 95% CI : (0.9953, 0.9993)
## No Information Rate : 0.5024
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.996
## Mcnemar's Test P-Value : 0.07364
##
## Sensitivity : 0.9960
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.9960
## Prevalence : 0.5024
## Detection Rate : 0.5004
## Detection Prevalence : 0.5004
## Balanced Accuracy : 0.9980
##
## 'Positive' Class : 0
##
Fig. 4.9: Significant model features after class balancing, ordered most to least significant from left to right.
After running the tuning data against the model, accuracy, TPR and TNR all decrease. F1 score = 0.8889.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 120 4
## 1 26 7
##
## Accuracy : 0.8089
## 95% CI : (0.7386, 0.8672)
## No Information Rate : 0.9299
## P-Value [Acc > NIR] : 1.000000
##
## Kappa : 0.2381
## Mcnemar's Test P-Value : 0.000126
##
## Sensitivity : 0.8219
## Specificity : 0.6364
## Pos Pred Value : 0.9677
## Neg Pred Value : 0.2121
## Prevalence : 0.9299
## Detection Rate : 0.7643
## Detection Prevalence : 0.7898
## Balanced Accuracy : 0.7291
##
## 'Positive' Class : 0
##
The new AUC = 0.8207 is a decrease from 0.9066, showing poorer pass/fail discrimination. The ROC curve for the tuning dataset is plotted below.
Fig. 4.10: Receiver operating characteristic (ROC) curve for class balanced, iterative logistic model fit.
Fig. 4.11: TPR, TNR, accuracy and F1 score vs. threshold.
Setting the threshold to 0.0174 increases TNR to 0.8182, but decreases accuracy to 0.7707 and TPR to 0.7671. F1 score = 0.8615.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 112 2
## 1 34 9
##
## Accuracy : 0.7707
## 95% CI : (0.697, 0.8339)
## No Information Rate : 0.9299
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.2496
## Mcnemar's Test P-Value : 2.383e-07
##
## Sensitivity : 0.7671
## Specificity : 0.8182
## Pos Pred Value : 0.9825
## Neg Pred Value : 0.2093
## Prevalence : 0.9299
## Detection Rate : 0.7134
## Detection Prevalence : 0.7261
## Balanced Accuracy : 0.7927
##
## 'Positive' Class : 0
##
Test data results below show accuracy = 0.859, TPR = 0.863, TNR = 0.8 and F1 score = 0.9197.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 63 1
## 1 10 4
##
## Accuracy : 0.859
## 95% CI : (0.7617, 0.9274)
## No Information Rate : 0.9359
## P-Value [Acc > NIR] : 0.99604
##
## Kappa : 0.3607
## Mcnemar's Test P-Value : 0.01586
##
## Sensitivity : 0.8630
## Specificity : 0.8000
## Pos Pred Value : 0.9844
## Neg Pred Value : 0.2857
## Prevalence : 0.9359
## Detection Rate : 0.8077
## Detection Prevalence : 0.8205
## Balanced Accuracy : 0.8315
##
## 'Positive' Class : 0
##
| Model | Stage | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|---|
| iterated logistic (SMOTE) | train | 0.9960 | 1.0000 | 0.9980 | 0.9980 |
| iterated logistic (SMOTE) | tune | 0.7671 | 0.8182 | 0.7707 | 0.8615 |
| iterated logistic (SMOTE) | test | 0.8630 | 0.8000 | 0.8590 | 0.9197 |
Fig. 4.12: TPR, TNR, accuracy and F1 score vs. model fit stage
In this last model a random forest classification fit is performed. In short, random forest is a machine learning technique that builds multiple decision trees and merges their predictions to obtain a more accurate and stable result. It is very susceptible to overfitting due to class imbalance, which was a primary reason for rebalancing the SECOM dataset. The randomForest package was used for this project.
The random forest model contains a number of tuning variables, of which the primary ones considered here were ‘cutoff’, ‘ntree’, ‘mtry’, ‘nodesize’ and ‘maxit’. The approach taken was to build a parameter grid to measure the model sensitivity for TPR, TNR, accuracy and F1 score. Initially the results were found to be insensitive to max iterations (maxit), so the remaining results assume maxit = 10 for the sake of speed. The plots below show TPR, TNR, accuracy and F1 score versus cutoff with ntree as the parameter, where ntree is the number of trees to grow in the forest for the training data. The ‘cutoff’ parameter functions similarly to the threshold for the earlier logistic models. The plots are faceted by ‘mtry’, the number of variables randomly sampled as candidates at each split, and ‘nodesize’, the minimum size of terminal nodes. Results for the training data are shown below, where it’s seen that the number of trees (ntree) has little effect for ntree > 100 due to the ‘cutoff’ settings restricting overfitting. The training results seem to suggest that a wide range of values for ‘cutoff’, ‘mtry’ and ‘nodesize’ yield a perfect fit.
Fig. 4.13: Training data True Positive Rate (TPR), True Negative Rate (TNR), Accuracy (ACC) and F1 Score (F1) vs. cutoff coarse grid plots.
Results for the tuning dataset in Fig. 4.14 below exhibit significantly more dependence on ‘cutoff’, ‘mtry’ and ‘nodesize’. The most significant change is in TNR, which now decreases as ‘cutoff’ increases and significantly reduces the model’s efficacy. Similar to the method used to identify the optimal threshold value for the earlier logistic models, Fig. 4.15 shows the relationship for TNR, TPR, accuracy and F1 score versus ‘cutoff’ overlaid on the same plots and faceted again by ‘mtry’ and ‘nodesize’. Using the magnified plot on the right side, the optimal model parameters are taken to be those that maximize the value at which these 4 measures intersect.
Fig. 4.14 Tuning data True Positive Rate (TPR), True Negative Rate (TNR), Accuracy (ACC) and F1 Score (F1) vs. cutoff coarse grid plots.
Fig. 4.15 Tuning data True Positive Rate (TPR), True Negative Rate (TNR), Accuracy (ACC) and F1 Score (F1) vs. cutoff overlay plots.
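One possible reading of "maximize the intersection" is to pick the cutoff at which the lowest of the four measures is as high as possible. The Python sketch below applies that reading to made-up vote fractions. The data, the candidate cutoffs, the function names and the max-min rule itself are all assumptions for illustration, not the procedure used in the report. It also assumes the two-class form of the randomForest vote rule, in which the class with the largest ratio of vote fraction to cutoff wins, with a cutoff pair of (c, 1 - c).

```python
def predict_with_cutoff(vote_frac_0, cutoff_0):
    """Return 0 or 1: the class with the larger vote-fraction/cutoff ratio wins.

    cutoff_0 must lie strictly between 0 and 1.
    """
    ratio_0 = vote_frac_0 / cutoff_0
    ratio_1 = (1.0 - vote_frac_0) / (1.0 - cutoff_0)
    return 0 if ratio_0 >= ratio_1 else 1


def four_measures(labels, preds):
    """TPR, TNR, accuracy and F1 with class 0 (pass) as the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 0 and p == 1)
    fp = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 1 and p == 1)
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "ACC": (tp + tn) / len(pairs),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


def best_cutoff(labels, votes_0, cutoffs):
    """Score each cutoff by its worst measure and return the best-scoring one."""
    scored = {}
    for c in cutoffs:
        preds = [predict_with_cutoff(v, c) for v in votes_0]
        scored[c] = min(four_measures(labels, preds).values())
    return max(scored, key=scored.get), scored


# Made-up example: 8 passing wafers (class 0) and 2 failing wafers (class 1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
votes_0 = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.55, 0.60, 0.30]
cutoffs = [0.20, 0.40, 0.62, 0.82]

best, scored = best_cutoff(labels, votes_0, cutoffs)
print(best, scored)
```

A low cutoff labels nearly every wafer as passing and drives TNR toward zero, while a high cutoff sacrifices TPR. The max-min score is largest where the curves cross, which is the behavior the overlay plots are used to locate.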
Training data results below show accuracy = 1, TPR = 1, TNR = 1 and F1 score = 1.
## Confusion Matrix and Statistics
##
##           Reference
## Prediction    0    1
##          0 1244    0
##          1    0 1232
##
## Accuracy : 1
## 95% CI : (0.9985, 1)
## No Information Rate : 0.5024
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.5024
## Detection Rate : 0.5024
## Detection Prevalence : 0.5024
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : 0
##
Tuning data results below show accuracy = 0.7962, TPR = 0.7945, TNR = 0.8182 and F1 score = 0.8788.
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   0   1
##          0 116   2
##          1  30   9
##
## Accuracy : 0.7962
## 95% CI : (0.7246, 0.8562)
## No Information Rate : 0.9299
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.2815
## Mcnemar's Test P-Value : 1.815e-06
##
## Sensitivity : 0.7945
## Specificity : 0.8182
## Pos Pred Value : 0.9831
## Neg Pred Value : 0.2308
## Prevalence : 0.9299
## Detection Rate : 0.7389
## Detection Prevalence : 0.7516
## Balanced Accuracy : 0.8064
##
## 'Positive' Class : 0
##
Test data results below show accuracy = 0.7821, TPR = 0.7808, TNR = 0.8 and F1 score = 0.8702.
## Confusion Matrix and Statistics
##
##           Reference
## Prediction  0  1
##          0 57  1
##          1 16  4
##
## Accuracy : 0.7821
## 95% CI : (0.6741, 0.8676)
## No Information Rate : 0.9359
## P-Value [Acc > NIR] : 0.999998
##
## Kappa : 0.2423
## Mcnemar's Test P-Value : 0.000685
##
## Sensitivity : 0.7808
## Specificity : 0.8000
## Pos Pred Value : 0.9828
## Neg Pred Value : 0.2000
## Prevalence : 0.9359
## Detection Rate : 0.7308
## Detection Prevalence : 0.7436
## Balanced Accuracy : 0.7904
##
## 'Positive' Class : 0
##
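The tuning and test metrics quoted above follow directly from the four cells of each confusion matrix. The Python sketch below recomputes them, treating class 0 as the positive class as the output indicates. The function name and the choice of Python are illustrative assumptions; the cell counts are copied from the matrices above.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Recompute summary metrics from confusion-matrix cells (positive class = 0)."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return {
        "TPR": round(tpr, 4),
        "TNR": round(tnr, 4),
        "Accuracy": round((tp + tn) / (tp + fn + fp + tn), 4),
        "F1": round(2 * tp / (2 * tp + fp + fn), 4),
        "Balanced Accuracy": round((tpr + tnr) / 2, 4),
    }


# Rows of the printed matrices are predictions and columns are reference labels, so:
# tp = predicted 0 / reference 0, fp = predicted 0 / reference 1,
# fn = predicted 1 / reference 0, tn = predicted 1 / reference 1.
tuning = confusion_metrics(tp=116, fn=30, fp=2, tn=9)
test = confusion_metrics(tp=57, fn=16, fp=1, tn=4)

print(tuning)
print(test)
```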
| Model | Stage | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|---|
| random forest | train | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| random forest | tune | 0.7945 | 0.8182 | 0.7962 | 0.8788 |
| random forest | test | 0.7808 | 0.8000 | 0.7821 | 0.8702 |
Fig. 4.16: TPR, TNR, accuracy and F1 score vs. model fit stage
Model summaries are provided below, grouped by modeling stage: train, tune or test. The test stage is considered the final model performance.
Table 4.7 summarizes TPR, TNR, accuracy and F1 score at the training stage for the models considered in this project. All models score very high, but the random forest results suggest a perfect fit. This reflects the tendency of random forest to overfit the training data, which is why unseen datasets are used for tuning and for the final test results.
| Model | Stage | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|---|
| logistic | train | 0.9992 | 0.8977 | 0.9925 | 0.9960 |
| iterated logistic | train | 0.9855 | 0.0455 | 0.9234 | 0.9601 |
| iterated logistic (SMOTE) | train | 0.9960 | 1.0000 | 0.9980 | 0.9980 |
| random forest | train | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Table 4.8 summarizes TPR, TNR, accuracy and F1 score at the tuning stage for the models considered in this project. Performance decays by nearly the same amount for all models, but the results suggest that the iterated logistic model provides the best overall performance, with random forest right behind.
| Model | Stage | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|---|
| logistic | tune | 0.7877 | 0.8182 | 0.7898 | 0.8745 |
| iterated logistic | tune | 0.8151 | 0.8182 | 0.8153 | 0.8914 |
| iterated logistic (SMOTE) | tune | 0.7671 | 0.8182 | 0.7707 | 0.8615 |
| random forest | tune | 0.7945 | 0.8182 | 0.7962 | 0.8788 |
Table 4.9 summarizes TPR, TNR, accuracy and F1 score at the final testing stage for the models considered in this project. The iterated logistic model shifted notably due to threshold sensitivity, while the remaining three models stayed comparatively stable. The baseline logistic and class-balanced iterated logistic models appear to perform identically and show a boost of roughly 10% in overall performance compared with the tuning data results. Of these two, the class-balanced iterated logistic model is preferred because it has fewer features and the lower AIC score. However, the random forest results are not far behind and in fact vary less between tuning and test. Random forest is therefore a very good alternative model.
| Model | Stage | TPR | TNR | Accuracy | F1-Score |
|---|---|---|---|---|---|
| logistic | test | 0.8630 | 0.8000 | 0.8590 | 0.9197 |
| iterated logistic | test | 0.4795 | 1.0000 | 0.5128 | 0.6481 |
| iterated logistic (SMOTE) | test | 0.8630 | 0.8000 | 0.8590 | 0.9197 |
| random forest | test | 0.7808 | 0.8000 | 0.7821 | 0.8702 |
Fig. 4.17: TPR, TNR, accuracy and F1 score vs. model.
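The tune-to-test comparison can be made concrete with a little arithmetic. The Python sketch below computes the absolute and relative change in accuracy for each model, using the values from Tables 4.8 and 4.9. Using accuracy as the single comparison measure, and the function name, are assumptions made for illustration.

```python
# Accuracy values copied from Table 4.8 (tune) and Table 4.9 (test).
accuracy = {
    "logistic": {"tune": 0.7898, "test": 0.8590},
    "iterated logistic": {"tune": 0.8153, "test": 0.5128},
    "iterated logistic (SMOTE)": {"tune": 0.7707, "test": 0.8590},
    "random forest": {"tune": 0.7962, "test": 0.7821},
}


def tune_to_test_change(scores):
    """Return (absolute change, change relative to the tuning value)."""
    absolute = scores["test"] - scores["tune"]
    relative = absolute / scores["tune"]
    return round(absolute, 4), round(relative, 3)


changes = {model: tune_to_test_change(s) for model, s in accuracy.items()}
for model, (abs_change, rel_change) in changes.items():
    print(f"{model}: {abs_change:+.4f} ({rel_change:+.1%})")
```

On accuracy, the baseline logistic and class-balanced iterated logistic models gain roughly 9% and 11% relative to tuning, while random forest changes by less than 2%.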
Fig. A.1: Initial variable density and QQ plots.
Fig. B.1: Final transformed variable density and QQ plots.
##
## Call:
## bayesglm(formula = Status ~ ., family = "binomial", data = trainData,
## maxit = 200)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.28232 -0.12155 -0.02053 -0.00224 1.99066
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.0626331 0.8107630 -8.711 < 2e-16 ***
## V1 -0.2609767 0.2954170 -0.883 0.37701
## V2 -0.2903392 0.2861980 -1.014 0.31036
## V3 -0.3216495 0.4475332 -0.719 0.47232
## V4 -0.0416842 0.5799741 -0.072 0.94270
## V5 0.0657829 0.4475007 0.147 0.88313
## V7 -0.1858288 0.4590828 -0.405 0.68564
## V8 0.4945563 0.4419088 1.119 0.26308
## V9 -0.0367219 0.3390667 -0.108 0.91376
## V10 -0.1767529 0.2893000 -0.611 0.54122
## V11 -0.3726488 0.2985720 -1.248 0.21199
## V12 0.0418189 0.6990001 0.060 0.95229
## V13 -0.0990936 0.7973638 -0.124 0.90110
## V15 -1.4565009 0.6830570 -2.132 0.03298 *
## V16 -0.0338828 0.3488698 -0.097 0.92263
## V17 0.1347087 0.3175562 0.424 0.67142
## V18 -0.1564667 0.7916692 -0.198 0.84333
## V19 -0.0822949 0.7228763 -0.114 0.90936
## V20 0.5903551 0.7538726 0.783 0.43357
## V21 0.2227250 0.3419533 0.651 0.51483
## V22 0.1735002 0.5114352 0.339 0.73443
## V23 -0.1982178 0.5508594 -0.360 0.71897
## V24 0.2420678 0.3688182 0.656 0.51161
## V25 -0.0959748 0.3081243 -0.311 0.75544
## V26 0.4679630 0.8898115 0.526 0.59895
## V27 -0.8255856 0.6849044 -1.205 0.22805
## V28 0.3489043 0.8124104 0.429 0.66758
## V29 -0.7793114 0.4784178 -1.629 0.10333
## V30 0.5362285 0.6003993 0.893 0.37179
## V31 -0.5604755 0.6282706 -0.892 0.37234
## V32 -0.2243625 0.6225837 -0.360 0.71857
## V33 0.4759381 0.5109881 0.931 0.35164
## V34 0.3263706 0.4835872 0.675 0.49974
## V35 -0.1851977 0.8672536 -0.214 0.83090
## V36 -0.5326200 1.0324499 -0.516 0.60594
## V37 0.2595841 0.8851638 0.293 0.76932
## V38 -0.1476146 0.3794146 -0.389 0.69723
## V39 0.0887534 0.3251469 0.273 0.78488
## V40 0.2584545 0.4871415 0.531 0.59573
## V41 -0.6067620 0.2984003 -2.033 0.04201 *
## V42 -0.3785422 0.2967943 -1.275 0.20215
## V44 0.0768621 0.8145574 0.094 0.92482
## V45 -0.6263321 0.4730331 -1.324 0.18548
## V46 0.3827740 0.9032605 0.424 0.67173
## V47 0.1611046 0.9596048 0.168 0.86667
## V48 0.1287432 0.5949389 0.216 0.82868
## V49 -0.3280050 0.6595824 -0.497 0.61898
## V51 0.2220980 0.9333681 0.238 0.81192
## V52 0.2285163 0.7310127 0.313 0.75458
## V54 -0.9457464 0.9004829 -1.050 0.29360
## V55 -0.5242772 0.8708652 -0.602 0.54716
## V56 0.0903495 0.4931756 0.183 0.85464
## V57 1.9962867 0.8004055 2.494 0.01263 *
## V58 -0.3906488 0.4244126 -0.920 0.35734
## V59 -0.1816207 0.4266713 -0.426 0.67035
## V60 0.2479856 0.4797081 0.517 0.60519
## V61 0.5950351 0.7632475 0.780 0.43562
## V62 0.3274722 0.6171333 0.531 0.59567
## V63 -1.3817837 0.9876608 -1.399 0.16180
## V64 0.2292920 0.7927033 0.289 0.77239
## V65 -0.0202961 0.7117847 -0.029 0.97725
## V66 -0.6251813 0.7458192 -0.838 0.40189
## V67 1.4432582 0.9567874 1.508 0.13144
## V68 0.8236013 0.4066536 2.025 0.04284 *
## V69 -0.1136487 0.5467826 -0.208 0.83535
## V71 -0.6795923 1.0398890 -0.654 0.51342
## V72 0.3790642 0.4189382 0.905 0.36556
## V76 0.1720724 0.3791041 0.454 0.64991
## V77 -0.3074620 0.4393680 -0.700 0.48406
## V78 0.4136704 0.3098127 1.335 0.18180
## V79 0.1767349 0.4124935 0.428 0.66832
## V80 0.2674348 0.3755517 0.712 0.47640
## V81 -0.3436699 0.3425937 -1.003 0.31579
## V82 0.2318527 0.4546468 0.510 0.61008
## V83 -0.2744445 0.3587724 -0.765 0.44430
## V84 -0.0647604 0.3446708 -0.188 0.85096
## V85 -0.2077407 0.2952136 -0.704 0.48162
## V87 0.3866824 0.3217645 1.202 0.22946
## V88 0.7068155 0.3977482 1.777 0.07556 .
## V89 -0.1963311 0.3205191 -0.613 0.54018
## V90 0.0147664 0.3512788 0.042 0.96647
## V91 -0.0035475 0.3894200 -0.009 0.99273
## V92 -0.1106835 0.3962985 -0.279 0.78002
## V93 -0.4749683 0.7153539 -0.664 0.50671
## V94 -0.1179139 0.7125461 -0.165 0.86856
## V95 -0.6089097 0.7171162 -0.849 0.39582
## V97 0.0393488 0.7868559 0.050 0.96012
## V99 0.0809440 0.7842281 0.103 0.91779
## V100 0.5569485 0.7002325 0.795 0.42639
## V101 0.6049213 0.5090242 1.188 0.23468
## V102 -0.1955910 0.6176890 -0.317 0.75151
## V103 0.1759080 0.3678421 0.478 0.63250
## V104 0.1819085 0.3996908 0.455 0.64902
## V105 0.2415146 0.6872707 0.351 0.72528
## V106 -0.2131306 0.7108001 -0.300 0.76429
## V107 -0.0735238 0.7246127 -0.101 0.91918
## V108 0.1775315 0.3233717 0.549 0.58300
## V109 -0.8231233 0.3360476 -2.449 0.01431 *
## V114 0.0911131 0.3551052 0.257 0.79750
## V115 0.5033300 0.1918844 2.623 0.00871 **
## V116 -0.2033341 0.4001198 -0.508 0.61132
## V117 -0.3328822 0.5144986 -0.647 0.51763
## V118 -0.4976497 0.3004454 -1.656 0.09765 .
## V119 0.2434761 0.3333034 0.730 0.46509
## V120 -0.3168095 0.4364563 -0.726 0.46792
## V121 -0.2069706 0.3198155 -0.647 0.51753
## V122 0.3613246 0.8064854 0.448 0.65414
## V123 -0.8969228 0.8429493 -1.064 0.28732
## V124 0.3627920 0.8306660 0.437 0.66229
## V125 -1.0045048 0.6700535 -1.499 0.13384
## V126 0.3706037 0.4453570 0.832 0.40532
## V127 0.2918920 0.3910483 0.746 0.45540
## V128 -0.2955692 0.7255705 -0.407 0.68374
## V129 -0.0729250 0.4110039 -0.177 0.85917
## V130 1.0358576 0.4638694 2.233 0.02554 *
## V131 0.7490484 0.6013185 1.246 0.21288
## V132 0.3553873 0.3570113 0.995 0.31952
## V133 -2.1659604 0.5499728 -3.938 8.21e-05 ***
## V134 0.5054298 0.5185429 0.975 0.32970
## V135 -0.7102546 0.4265574 -1.665 0.09590 .
## V136 0.2064456 0.8013874 0.258 0.79671
## V137 -0.2075425 0.7519885 -0.276 0.78255
## V138 1.0138627 0.7541961 1.344 0.17885
## V139 0.2469827 0.8256848 0.299 0.76484
## V140 0.3580337 0.7628672 0.469 0.63884
## V141 -0.0679244 0.7581838 -0.090 0.92861
## V143 -0.6212964 0.8198446 -0.758 0.44856
## V144 -0.2580270 0.7963352 -0.324 0.74592
## V145 -0.1123243 0.8378477 -0.134 0.89335
## V146 -0.1369243 0.6270489 -0.218 0.82715
## V147 0.0582829 0.5808889 0.100 0.92008
## V148 0.0456608 0.9255044 0.049 0.96065
## V149 0.0537805 0.8994106 0.060 0.95232
## V151 0.1431288 0.7970741 0.180 0.85749
## V152 -0.6601071 0.9318416 -0.708 0.47870
## V153 0.4809584 0.9155067 0.525 0.59934
## V154 -0.3974150 0.9582592 -0.415 0.67834
## V155 0.1266631 0.6636278 0.191 0.84863
## V156 -0.4044974 0.8681858 -0.466 0.64128
## V157 -0.1939776 0.8389382 -0.231 0.81715
## V160 0.0624372 0.8742598 0.071 0.94307
## V161 0.0251433 0.8718715 0.029 0.97699
## V162 -0.4504576 0.8687623 -0.519 0.60411
## V163 -0.1638706 0.7955344 -0.206 0.83680
## V164 0.1164405 0.8456100 0.138 0.89048
## V165 -1.0321385 0.9877975 -1.045 0.29607
## V166 -0.1902315 0.8370845 -0.227 0.82023
## V167 -0.1632829 0.8203727 -0.199 0.84224
## V168 -0.0603130 0.7684046 -0.078 0.93744
## V169 0.3626414 0.7948488 0.456 0.64822
## V170 -0.1467362 0.8390657 -0.175 0.86117
## V171 0.5007822 0.9031328 0.554 0.57924
## V172 -0.1397866 0.8546297 -0.164 0.87007
## V173 0.2406598 1.0178230 0.236 0.81309
## V174 0.0554251 0.8215546 0.067 0.94621
## V175 0.2420646 1.0181949 0.238 0.81208
## V176 -0.3562847 0.8497749 -0.419 0.67502
## V177 0.3141451 0.8823361 0.356 0.72181
## V178 -0.2510331 0.9101822 -0.276 0.78270
## V181 0.4237619 0.8416764 0.503 0.61463
## V182 0.0128014 0.8378240 0.015 0.98781
## V183 -1.2097130 0.9002468 -1.344 0.17903
## V184 0.5487956 0.9695541 0.566 0.57137
## V185 0.0418883 0.8209145 0.051 0.95930
## V186 -0.0968686 0.8217339 -0.118 0.90616
## V188 -0.0760975 0.7667286 -0.099 0.92094
## V189 0.1259775 0.8075721 0.156 0.87604
## V196 0.3516633 0.7992957 0.440 0.65996
## V197 0.3660360 0.8758214 0.418 0.67599
## V198 0.2711449 0.8655990 0.313 0.75409
## V199 0.0474604 0.8617586 0.055 0.95608
## V200 0.0698766 0.8876045 0.079 0.93725
## V201 0.2502637 0.5695373 0.439 0.66036
## V202 0.6204552 0.9754446 0.636 0.52473
## V203 -0.0802787 0.8605530 -0.093 0.92568
## V204 -0.2676871 0.8890464 -0.301 0.76334
## V205 0.2321925 0.8408528 0.276 0.78244
## V206 0.5393198 0.9007928 0.599 0.54936
## V208 0.1871443 0.8684013 0.216 0.82937
## V209 0.4511181 0.7327818 0.616 0.53814
## V211 -0.9716241 0.7100798 -1.368 0.17121
## V212 0.5519825 0.4404887 1.253 0.21016
## V213 -0.4212089 0.7278399 -0.579 0.56278
## V214 0.0435710 0.7231859 0.060 0.95196
## V215 -0.0544460 0.5092389 -0.107 0.91486
## V216 0.2423296 0.6786008 0.357 0.72102
## V217 3.2507181 1.0879851 2.988 0.00281 **
## V218 0.1464191 0.6500678 0.225 0.82180
## V219 -0.0476824 0.7594367 -0.063 0.94994
## V220 -1.0055631 0.6294567 -1.598 0.11015
## V222 -0.3513815 0.9001673 -0.390 0.69628
## V223 0.0167196 0.9007372 0.019 0.98519
## V224 -0.1323140 0.8279427 -0.160 0.87303
## V225 0.0211339 0.8773004 0.024 0.98078
## V226 -0.2364156 0.7874945 -0.300 0.76402
## V228 0.1306615 0.6463882 0.202 0.83981
## V229 -0.3116533 0.6317764 -0.493 0.62180
## V239 0.1098125 0.6570780 0.167 0.86727
## V240 0.7469281 0.6467355 1.155 0.24812
## V249 -0.3482222 0.8730697 -0.399 0.69001
## V250 -0.5669784 0.3493426 -1.623 0.10459
## V251 -0.5728485 0.8020613 -0.714 0.47509
## V252 -0.6338811 0.9751506 -0.650 0.51567
## V253 -0.4095029 0.8615393 -0.475 0.63456
## V254 0.2984959 0.8444356 0.353 0.72372
## V255 0.0990940 0.8715111 0.114 0.90947
## V256 -0.1432898 0.8490343 -0.169 0.86598
## V268 -0.1296815 0.8782538 -0.148 0.88261
## V269 -1.5772773 1.0083975 -1.564 0.11778
## V270 0.2718801 0.8110513 0.335 0.73746
## V271 -0.5744718 0.5722478 -1.004 0.31543
## V272 0.6093143 0.7387112 0.825 0.40947
## V273 -0.9582791 0.8099712 -1.183 0.23677
## V274 0.1681029 0.6715356 0.250 0.80234
## V275 0.0842719 0.6113364 0.138 0.89036
## V276 0.2106246 0.5761745 0.366 0.71470
## V278 0.4074366 0.7389060 0.551 0.58136
## V279 -0.4013922 0.6384631 -0.629 0.52956
## V280 -0.0036597 0.7208223 -0.005 0.99595
## V281 -0.9490575 0.6349204 -1.495 0.13498
## V282 0.0170613 0.5763120 0.030 0.97638
## V283 0.3421478 0.9196238 0.372 0.70985
## V284 -0.2759614 0.8208019 -0.336 0.73671
## V286 0.7234695 0.7594023 0.953 0.34075
## V287 0.7148238 0.8354096 0.856 0.39219
## V288 0.1551009 0.8322914 0.186 0.85217
## V289 -0.3398170 0.9435200 -0.360 0.71873
## V290 -0.1912798 0.8356470 -0.229 0.81895
## V291 0.4685027 0.7838885 0.598 0.55006
## V292 0.5700700 0.7619434 0.748 0.45435
## V295 -0.0079636 0.8616872 -0.009 0.99263
## V296 0.0442099 0.8569144 0.052 0.95885
## V297 0.2568951 0.8444756 0.304 0.76097
## V298 0.2011221 0.7910273 0.254 0.79930
## V299 0.0215532 0.8298919 0.026 0.97928
## V300 0.3046830 0.8822240 0.345 0.72983
## V301 -0.0146151 0.8295470 -0.018 0.98594
## V302 0.4776934 0.6683009 0.715 0.47474
## V303 0.6809850 0.6968230 0.977 0.32843
## V304 -0.9108397 0.7197989 -1.265 0.20572
## V305 -0.0488861 0.7568244 -0.065 0.94850
## V306 -1.0521515 0.8066174 -1.304 0.19210
## V307 0.2182173 0.8000417 0.273 0.78504
## V308 -0.2395696 0.8931291 -0.268 0.78852
## V309 -0.2620820 0.7136087 -0.367 0.71342
## V310 -0.2375481 0.8931702 -0.266 0.79027
## V311 0.8225380 0.7300528 1.127 0.25988
## V312 -0.8400933 0.8098289 -1.037 0.29956
## V313 -0.3670892 0.9168726 -0.400 0.68888
## V317 -0.6094452 0.6067974 -1.004 0.31520
## V318 -0.8490647 0.7520006 -1.129 0.25887
## V319 0.3502580 0.8214065 0.426 0.66981
## V320 0.5145701 0.9536299 0.540 0.58948
## V321 -0.1150786 0.7157259 -0.161 0.87226
## V322 -0.1618136 0.6916551 -0.234 0.81502
## V324 -0.0452592 0.7333905 -0.062 0.95079
## V325 -0.2251048 0.7547548 -0.298 0.76551
## V332 0.1389524 0.7400890 0.188 0.85107
## V333 -0.2804245 0.8547496 -0.328 0.74285
## V334 0.2414320 0.7390197 0.327 0.74390
## V335 -0.1753587 0.8214179 -0.213 0.83095
## V336 -0.1922340 0.8295277 -0.232 0.81674
## V337 0.6882533 0.8522566 0.808 0.41934
## V338 -0.2576766 0.8795365 -0.293 0.76955
## V339 -0.7201586 0.8559546 -0.841 0.40015
## V340 0.1159822 0.8590721 0.135 0.89260
## V341 -0.2054277 0.7725027 -0.266 0.79030
## V342 0.0459951 0.7560682 0.061 0.95149
## V344 0.1570565 0.6561023 0.239 0.81081
## V345 -0.5185916 0.6972156 -0.744 0.45700
## V349 0.9882228 0.7146640 1.383 0.16673
## V350 0.1266279 0.4847641 0.261 0.79393
## V351 -0.0056791 0.7357563 -0.008 0.99384
## V352 -0.1686933 0.7193999 -0.234 0.81460
## V353 -0.2950797 0.5133616 -0.575 0.56543
## V354 -0.3591158 0.6932659 -0.518 0.60445
## V355 -3.2661052 1.1757497 -2.778 0.00547 **
## V356 0.0003764 0.5690591 0.001 0.99947
## V357 0.1047235 0.5854455 0.179 0.85803
## V358 0.5828071 0.7463246 0.781 0.43486
## V360 0.6073724 0.8300029 0.732 0.46431
## V361 0.4284109 0.8853853 0.484 0.62848
## V362 0.3807335 0.6762052 0.563 0.57340
## V363 -0.2136700 0.8185589 -0.261 0.79407
## V364 -0.2983575 0.6952909 -0.429 0.66784
## V366 0.4233955 0.6691615 0.633 0.52691
## V367 -0.5600137 0.6294712 -0.890 0.37365
## V368 0.0170830 0.4293518 0.040 0.96826
## V369 0.1445310 0.4030194 0.359 0.71988
## V377 -0.1601655 0.6285993 -0.255 0.79888
## V378 -0.4467912 0.6655744 -0.671 0.50204
## V387 0.0045248 0.7714592 0.006 0.99532
## V388 0.4236747 0.4303390 0.985 0.32486
## V389 0.8965211 0.7084275 1.266 0.20569
## V390 0.7067034 0.9144058 0.773 0.43961
## V391 -1.0428392 0.8168360 -1.277 0.20171
## V392 -0.5271247 0.6910804 -0.763 0.44561
## V393 -0.2738063 0.7886800 -0.347 0.72846
## V394 0.0541928 0.7951216 0.068 0.94566
## V406 -0.0765801 0.8332361 -0.092 0.92677
## V407 1.0302275 0.8405433 1.226 0.22032
## V408 0.0292068 0.6464461 0.045 0.96396
## V409 0.0429627 0.7936154 0.054 0.95683
## V410 -0.3183001 0.7792050 -0.408 0.68291
## V411 -0.0911054 0.8038069 -0.113 0.90976
## V412 0.3615881 0.8346542 0.433 0.66486
## V413 0.2886662 0.7525343 0.384 0.70128
## V414 -0.2736236 0.6930940 -0.395 0.69300
## V416 -0.5325770 0.8482360 -0.628 0.53009
## V417 -0.1372962 0.7916574 -0.173 0.86231
## V418 -0.1411078 0.8316802 -0.170 0.86527
## V419 0.0889076 0.2845415 0.312 0.75469
## V420 -0.6732591 0.2844762 -2.367 0.01795 *
## V421 0.0077155 0.9387541 0.008 0.99344
## V422 0.0291804 0.9013069 0.032 0.97417
## V424 -0.1756274 0.6154521 -0.285 0.77537
## V425 -0.6540488 0.9244016 -0.708 0.47923
## V426 0.4791146 0.9271286 0.517 0.60531
## V427 -0.4022132 0.9627763 -0.418 0.67612
## V428 0.3173049 0.8279537 0.383 0.70154
## V429 -0.3369207 0.8689777 -0.388 0.69822
## V430 -0.2004042 0.8430396 -0.238 0.81210
## V431 0.6393326 0.9348711 0.684 0.49406
## V432 0.2728285 0.8693594 0.314 0.75365
## V433 -0.0435409 0.7461504 -0.058 0.95347
## V434 0.3133971 0.5518152 0.568 0.57008
## V435 0.3271572 0.9116531 0.359 0.71970
## V436 -0.3589872 0.9944363 -0.361 0.71810
## V437 -0.0511105 0.9125793 -0.056 0.95534
## V438 -0.0617266 0.8162692 -0.076 0.93972
## V439 -0.3757216 0.8459325 -0.444 0.65693
## V440 0.5266817 0.8377626 0.629 0.52956
## V441 -0.2224706 0.8733908 -0.255 0.79894
## V442 0.4864810 0.8963662 0.543 0.58732
## V443 -0.2299861 0.8472141 -0.271 0.78604
## V444 0.2108247 1.0188279 0.207 0.83607
## V445 0.0817558 0.8421653 0.097 0.92266
## V446 0.2594407 1.0090161 0.257 0.79708
## V447 -0.3595798 0.8506947 -0.423 0.67252
## V448 0.2870588 0.8837037 0.325 0.74531
## V449 -0.2946517 0.9081328 -0.324 0.74559
## V453 0.3697659 0.8299932 0.446 0.65596
## V454 0.0003640 0.8102090 0.000 0.99964
## V455 0.5166208 0.6704132 0.771 0.44094
## V456 0.5984086 0.9759750 0.613 0.53978
## V457 0.0322003 0.7788074 0.041 0.96702
## V458 -0.0684957 0.8469678 -0.081 0.93554
## V460 -0.3482403 0.8149084 -0.427 0.66913
## V461 0.7794605 0.6115443 1.275 0.20246
## V468 -0.9980197 0.8479269 -1.177 0.23919
## V469 -0.8820492 0.5209226 -1.693 0.09041 .
## V470 0.2341780 0.8605544 0.272 0.78553
## V471 0.0419922 0.8705161 0.048 0.96153
## V472 0.1546254 0.8687182 0.178 0.85873
## V473 -0.1365753 0.5873087 -0.233 0.81612
## V474 0.5511690 0.9270395 0.595 0.55215
## V475 0.2063927 0.8647824 0.239 0.81137
## V476 -0.3337944 0.8990638 -0.371 0.71044
## V477 -0.4550743 0.7718554 -0.590 0.55547
## V478 0.5104058 0.9056591 0.564 0.57304
## V480 0.1729619 0.8670065 0.199 0.84188
## V481 0.5638564 0.6917409 0.815 0.41500
## V483 0.3263200 0.3239205 1.007 0.31374
## V484 0.0019026 0.4135573 0.005 0.99633
## V485 -0.2183615 0.2739442 -0.797 0.42539
## V486 0.4634651 0.4093939 1.132 0.25760
## V487 -0.0690386 0.3147754 -0.219 0.82640
## V488 0.0536135 0.3487044 0.154 0.87781
## V489 0.2150336 0.5493566 0.391 0.69548
## V490 -0.0974553 0.3275109 -0.298 0.76604
## V491 -0.3797521 0.7487976 -0.507 0.61205
## V492 0.7100876 0.7682276 0.924 0.35532
## V494 -0.3891977 0.9006861 -0.432 0.66566
## V495 0.0333131 0.9061040 0.037 0.97067
## V496 -0.1464759 0.8266558 -0.177 0.85936
## V497 0.0439147 0.8117721 0.054 0.95686
## V498 -0.3517596 0.7189518 -0.489 0.62465
## V500 -0.2974814 0.2760422 -1.078 0.28118
## V501 0.4650392 0.2765706 1.681 0.09268 .
## V511 0.2129477 0.4603679 0.463 0.64368
## V512 0.0768153 0.2671875 0.287 0.77373
## V521 -0.3390011 0.8786811 -0.386 0.69964
## V523 0.5295379 0.8305609 0.638 0.52376
## V524 -0.4754707 0.9757804 -0.487 0.62607
## V525 1.2570773 0.9023092 1.393 0.16357
## V526 0.1736531 0.8285304 0.210 0.83399
## V527 0.0481081 0.8850489 0.054 0.95665
## V528 -0.1113389 0.8422341 -0.132 0.89483
## V540 -0.1066146 0.8662838 -0.123 0.90205
## V541 -0.0305466 0.7539879 -0.041 0.96768
## V542 0.2797984 0.7861142 0.356 0.72190
## V543 0.0730589 0.3856108 0.189 0.84973
## V544 -0.4653272 0.8263907 -0.563 0.57338
## V545 0.1761793 0.5269656 0.334 0.73813
## V546 -0.3116076 0.7955497 -0.392 0.69529
## V559 -0.0561961 0.3562770 -0.158 0.87467
## V560 -0.0444981 0.8270193 -0.054 0.95709
## V561 0.2269088 0.7303752 0.311 0.75605
## V562 -0.0345187 0.7936518 -0.043 0.96531
## V571 -0.1232187 0.3008473 -0.410 0.68212
## V572 -0.0702097 0.3875367 -0.181 0.85623
## V573 -0.0907062 0.8922732 -0.102 0.91903
## V574 -0.3158834 0.7683541 -0.411 0.68099
## V575 0.0356215 0.8706793 0.041 0.96737
## V576 0.0404858 0.6985058 0.058 0.95378
## V577 -0.0814236 0.8981351 -0.091 0.92776
## V578 -0.1274804 0.7568788 -0.168 0.86625
## V583 0.3913854 0.2935175 1.333 0.18239
## V584 -0.0704469 0.7963731 -0.088 0.92951
## V585 0.4069519 0.5856488 0.695 0.48713
## V586 -0.0841964 0.7922007 -0.106 0.91536
## V587 -0.2773683 0.4996178 -0.555 0.57878
## V588 0.4648039 0.7222175 0.644 0.51985
## V589 0.2610280 0.6882760 0.379 0.70450
## V590 -0.6848704 0.6098317 -1.123 0.26142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 648.26 on 1331 degrees of freedom
## Residual deviance: 118.71 on 917 degrees of freedom
## AIC: 948.71
##
## Number of Fisher Scoring iterations: 41
##
## Call:
## bayesglm(formula = Status ~ ., family = "binomial", data = trainData,
## maxit = 200)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.56392 -0.33293 -0.19485 -0.09861 3.01216
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.55622 0.21713 -16.378 < 2e-16 ***
## V15 -0.34637 0.13397 -2.585 0.009729 **
## V23 -0.23428 0.09679 -2.421 0.015497 *
## V54 -0.51206 0.23218 -2.205 0.027421 *
## V57 1.15227 0.26356 4.372 1.23e-05 ***
## V63 -0.86547 0.22644 -3.822 0.000132 ***
## V67 0.77974 0.27749 2.810 0.004955 **
## V100 0.27718 0.09772 2.836 0.004563 **
## V103 -0.28223 0.12166 -2.320 0.020345 *
## V123 -0.58493 0.16876 -3.466 0.000528 ***
## V130 0.41991 0.17084 2.458 0.013974 *
## V133 -0.66215 0.15019 -4.409 1.04e-05 ***
## V134 0.31229 0.13926 2.243 0.024926 *
## V153 0.24604 0.11104 2.216 0.026704 *
## V206 0.44612 0.14326 3.114 0.001845 **
## V217 1.13694 0.41439 2.744 0.006076 **
## V282 -0.39938 0.13807 -2.893 0.003821 **
## V303 0.33562 0.13175 2.547 0.010854 *
## V320 0.29149 0.14497 2.011 0.044350 *
## V334 0.58583 0.15626 3.749 0.000177 ***
## V355 -1.18383 0.42903 -2.759 0.005793 **
## V420 -0.26984 0.12739 -2.118 0.034157 *
## V425 -0.40300 0.16584 -2.430 0.015099 *
## V469 -0.62795 0.13074 -4.803 1.56e-06 ***
## V483 0.26005 0.12377 2.101 0.035630 *
## V512 0.25882 0.12201 2.121 0.033904 *
## V542 0.36320 0.12448 2.918 0.003527 **
## V588 0.50233 0.15147 3.316 0.000912 ***
## V590 -0.34552 0.15839 -2.181 0.029153 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 648.26 on 1331 degrees of freedom
## Residual deviance: 457.26 on 1303 degrees of freedom
## AIC: 515.26
##
## Number of Fisher Scoring iterations: 12
##
## Call:
## bayesglm(formula = Status ~ ., family = "binomial", data = trainData,
## maxit = 200)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0305 -0.0014 0.0000 0.2014 0.7839
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -11.5000 0.7084 -16.233 < 2e-16 ***
## V1 -0.9920 0.2283 -4.345 1.39e-05 ***
## V2 -0.7869 0.2014 -3.908 9.31e-05 ***
## V3 -0.7597 0.2449 -3.102 0.001922 **
## V10 -0.5801 0.2215 -2.619 0.008816 **
## V11 -0.6344 0.2080 -3.050 0.002290 **
## V15 -2.7126 0.3374 -8.040 8.99e-16 ***
## V20 1.8696 0.6425 2.910 0.003617 **
## V24 0.6414 0.2300 2.788 0.005299 **
## V26 3.6851 0.5083 7.250 4.16e-13 ***
## V27 -2.9022 0.4666 -6.220 4.97e-10 ***
## V29 -1.6331 0.3349 -4.876 1.08e-06 ***
## V33 0.5546 0.1857 2.986 0.002828 **
## V41 -0.8894 0.2171 -4.097 4.18e-05 ***
## V45 -1.0037 0.3144 -3.192 0.001411 **
## V52 0.8990 0.2342 3.839 0.000124 ***
## V54 -3.9522 0.4761 -8.301 < 2e-16 ***
## V57 6.1085 0.6203 9.848 < 2e-16 ***
## V58 -0.6196 0.2530 -2.449 0.014345 *
## V61 2.3491 0.4624 5.081 3.76e-07 ***
## V63 -3.2111 0.6598 -4.867 1.14e-06 ***
## V67 5.5902 0.7990 6.996 2.63e-12 ***
## V68 2.1104 0.2671 7.902 2.75e-15 ***
## V71 -2.3889 0.7291 -3.277 0.001050 **
## V72 1.0369 0.2667 3.888 0.000101 ***
## V76 1.0069 0.2692 3.741 0.000183 ***
## V78 0.9658 0.2090 4.621 3.82e-06 ***
## V81 -0.7890 0.2365 -3.336 0.000849 ***
## V85 -0.6350 0.2343 -2.710 0.006738 **
## V87 0.8952 0.2369 3.779 0.000157 ***
## V88 1.5958 0.4609 3.462 0.000536 ***
## V89 -0.4653 0.2220 -2.096 0.036096 *
## V93 -0.7844 0.2384 -3.291 0.001000 **
## V95 -1.3690 0.2385 -5.741 9.44e-09 ***
## V101 1.1561 0.2996 3.859 0.000114 ***
## V104 0.6719 0.2723 2.468 0.013594 *
## V109 -1.4668 0.2418 -6.066 1.31e-09 ***
## V115 0.8329 0.1497 5.563 2.65e-08 ***
## V116 1.6407 0.4098 4.004 6.24e-05 ***
## V118 -0.7976 0.2345 -3.401 0.000671 ***
## V120 -0.4927 0.1942 -2.537 0.011187 *
## V121 -0.9117 0.2504 -3.641 0.000272 ***
## V122 2.0396 0.6688 3.050 0.002290 **
## V123 -2.8904 0.5594 -5.167 2.38e-07 ***
## V125 -2.2839 0.6019 -3.794 0.000148 ***
## V126 1.1032 0.3755 2.938 0.003300 **
## V130 2.1783 0.3773 5.773 7.77e-09 ***
## V131 1.4347 0.4744 3.024 0.002494 **
## V132 0.6337 0.2146 2.952 0.003153 **
## V133 -4.6272 0.4466 -10.362 < 2e-16 ***
## V134 1.0944 0.4457 2.455 0.014074 *
## V135 -1.1150 0.2929 -3.806 0.000141 ***
## V138 3.8786 0.7251 5.349 8.82e-08 ***
## V140 1.1417 0.2881 3.963 7.41e-05 ***
## V144 -1.5551 0.2591 -6.001 1.96e-09 ***
## V152 -4.7586 0.9987 -4.765 1.89e-06 ***
## V153 1.6906 0.2203 7.675 1.65e-14 ***
## V156 -1.3675 0.3018 -4.531 5.87e-06 ***
## V165 -1.5429 0.3111 -4.959 7.09e-07 ***
## V176 -2.7619 0.7254 -3.807 0.000140 ***
## V177 3.9399 0.9771 4.032 5.52e-05 ***
## V183 -7.4158 1.4798 -5.011 5.41e-07 ***
## V198 1.9679 0.4395 4.478 7.55e-06 ***
## V202 1.7584 0.5734 3.067 0.002163 **
## V206 1.9254 0.2942 6.544 5.98e-11 ***
## V211 -3.6233 0.7120 -5.089 3.60e-07 ***
## V212 1.0744 0.1747 6.149 7.80e-10 ***
## V213 -0.6228 0.2734 -2.278 0.022735 *
## V217 9.6575 1.0279 9.395 < 2e-16 ***
## V220 -2.9128 0.5358 -5.437 5.42e-08 ***
## V240 1.8563 0.6158 3.015 0.002573 **
## V250 -0.9985 0.2721 -3.670 0.000243 ***
## V251 -16.5879 2.4290 -6.829 8.54e-12 ***
## V252 -5.7323 1.3252 -4.326 1.52e-05 ***
## V269 -3.0183 0.7884 -3.829 0.000129 ***
## V270 1.1179 0.2448 4.566 4.96e-06 ***
## V272 0.7892 0.2943 2.681 0.007332 **
## V273 -3.8469 0.7758 -4.958 7.11e-07 ***
## V274 1.2668 0.2839 4.462 8.14e-06 ***
## V281 -2.1567 0.2592 -8.319 < 2e-16 ***
## V286 1.4054 0.2939 4.781 1.74e-06 ***
## V287 3.5078 1.0636 3.298 0.000974 ***
## V292 0.5515 0.2071 2.663 0.007733 **
## V303 4.2155 0.6314 6.677 2.44e-11 ***
## V304 -3.4862 0.5434 -6.416 1.40e-10 ***
## V311 2.8542 0.7219 3.953 7.70e-05 ***
## V312 -4.0998 0.9709 -4.223 2.41e-05 ***
## V313 -1.6263 0.3249 -5.005 5.57e-07 ***
## V318 -1.2054 0.3319 -3.632 0.000281 ***
## V320 2.6275 0.3504 7.499 6.43e-14 ***
## V322 -0.6225 0.2958 -2.104 0.035336 *
## V337 1.3291 0.5322 2.497 0.012519 *
## V339 -0.9122 0.4441 -2.054 0.039957 *
## V341 -0.8390 0.3107 -2.700 0.006925 **
## V349 3.5507 0.7363 4.822 1.42e-06 ***
## V354 -0.9627 0.2267 -4.247 2.17e-05 ***
## V355 -9.9337 1.0877 -9.133 < 2e-16 ***
## V358 3.4727 0.5807 5.980 2.23e-09 ***
## V361 0.8224 0.3367 2.442 0.014596 *
## V364 -2.7227 0.4860 -5.603 2.11e-08 ***
## V366 0.9925 0.2626 3.779 0.000157 ***
## V367 -1.8057 0.2682 -6.733 1.67e-11 ***
## V378 -1.2598 0.6304 -1.998 0.045685 *
## V388 0.8532 0.2916 2.926 0.003438 **
## V389 5.5477 0.8560 6.481 9.11e-11 ***
## V390 5.5490 1.3320 4.166 3.10e-05 ***
## V391 -4.5215 0.9269 -4.878 1.07e-06 ***
## V392 -0.5695 0.1928 -2.954 0.003140 **
## V407 2.4417 0.7662 3.187 0.001438 **
## V416 -1.4357 0.3002 -4.783 1.73e-06 ***
## V418 -1.0110 0.2525 -4.004 6.22e-05 ***
## V420 -1.5989 0.2098 -7.623 2.48e-14 ***
## V427 -0.9702 0.2365 -4.103 4.09e-05 ***
## V431 1.3424 0.2744 4.891 1.00e-06 ***
## V432 0.6757 0.2909 2.323 0.020166 *
## V435 1.0521 0.3347 3.143 0.001671 **
## V439 -3.8671 0.6622 -5.840 5.22e-09 ***
## V440 3.3945 0.5766 5.887 3.94e-09 ***
## V446 1.0568 0.2595 4.073 4.65e-05 ***
## V455 6.1353 1.3056 4.699 2.61e-06 ***
## V461 1.3360 0.2567 5.205 1.94e-07 ***
## V468 -1.0443 0.3149 -3.317 0.000911 ***
## V469 -2.8622 0.2990 -9.573 < 2e-16 ***
## V474 1.0867 0.4457 2.438 0.014763 *
## V481 1.7325 0.3663 4.730 2.25e-06 ***
## V485 -0.5810 0.2134 -2.722 0.006486 **
## V487 -0.6034 0.2146 -2.812 0.004931 **
## V489 1.2109 0.2865 4.227 2.37e-05 ***
## V491 -0.7923 0.2176 -3.641 0.000272 ***
## V500 -0.6167 0.2071 -2.978 0.002901 **
## V501 0.9741 0.2206 4.415 1.01e-05 ***
## V521 -1.6326 0.2396 -6.815 9.42e-12 ***
## V523 13.5183 2.1016 6.432 1.26e-10 ***
## V525 3.9293 0.8964 4.383 1.17e-05 ***
## V544 -0.9231 0.3280 -2.814 0.004885 **
## V561 0.5327 0.2168 2.457 0.013998 *
## V574 -1.1473 0.2761 -4.155 3.26e-05 ***
## V583 0.6014 0.2244 2.680 0.007356 **
## V585 0.4149 0.2025 2.048 0.040525 *
## V589 0.7340 0.2305 3.185 0.001450 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3432.4 on 2475 degrees of freedom
## Residual deviance: 177.1 on 2336 degrees of freedom
## AIC: 457.1
##
## Number of Fisher Scoring iterations: 34
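The AIC values in the three summaries above are consistent with the usual relationship for a binomial fit on 0/1 responses, AIC = residual deviance + 2k, where k is the number of fitted coefficients including the intercept. The Python sketch below checks this, assuming k can be recovered as the difference between the null and residual degrees of freedom plus one. The function names are illustrative; the numbers are copied from the summaries in the order they appear.

```python
def n_coefficients(null_df, residual_df):
    """Number of fitted coefficients (including the intercept) implied by the df."""
    return null_df - residual_df + 1


def aic_from_deviance(residual_deviance, k):
    """AIC for a binary-response fit: residual deviance plus 2 per coefficient."""
    return residual_deviance + 2 * k


# One entry per summary above, in order of appearance.
summaries = [
    {"null_df": 1331, "residual_df": 917, "residual_deviance": 118.71, "reported_aic": 948.71},
    {"null_df": 1331, "residual_df": 1303, "residual_deviance": 457.26, "reported_aic": 515.26},
    {"null_df": 2475, "residual_df": 2336, "residual_deviance": 177.1, "reported_aic": 457.1},
]

for s in summaries:
    s["k"] = n_coefficients(s["null_df"], s["residual_df"])
    s["aic"] = round(aic_from_deviance(s["residual_deviance"], s["k"]), 2)
    print(s["k"], s["aic"], s["reported_aic"])
```

The penalty term 2k shows why the reduced models score a lower AIC despite a higher or similar residual deviance: the first fit carries 415 coefficients, against 29 and 140 for the other two.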